15 research outputs found
Small in-distribution changes in 3D perspective and lighting fool both CNNs and Transformers
Neural networks are susceptible to small transformations including 2D
rotations and shifts, image crops, and even changes in object colors. This is
often attributed to biases in the training dataset, and the lack of 2D
shift-invariance due to not respecting the sampling theorem. In this paper, we
challenge this hypothesis by training and testing on unbiased datasets, and
showing that networks are brittle to both small 3D perspective changes and
lighting variations which cannot be explained by dataset bias or lack of
shift-invariance. To find these in-distribution errors, we introduce an
evolution strategies (ES) based approach, which we call CMA-Search. Despite
training with a large-scale, unbiased dataset (0.5 million images) of camera
and light variations, in over 71% of cases CMA-Search can find camera parameters
in the vicinity of a correctly classified image which lead to in-distribution
misclassifications with a < 3.6% change in parameters. With lighting changes,
CMA-Search finds misclassifications in 33% of cases with a < 11.6% change in
parameters. Finally, we extend this method to find misclassifications in the
vicinity of ImageNet images for both ResNet and OpenAI's CLIP model.
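The search idea described above can be illustrated with a small sketch. This is not the paper's CMA-Search implementation (which uses full CMA-ES over rendered scene parameters); it is a minimal evolution-strategies loop, with an assumed `confidence(p)` function standing in for the network's confidence in the true class at scene parameters `p`, that perturbs the parameters of a correctly classified starting point until the confidence drops below 0.5:

```python
import numpy as np

def es_search(confidence, params, step=0.02, pop=20, elites=5, iters=100, seed=0):
    """Toy evolution-strategies search in the spirit of CMA-Search.

    confidence(p): assumed callable returning the classifier's confidence
    in the true class for scene parameters p (camera or lighting).
    Returns parameters that are misclassified (confidence < 0.5), or None.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(params, dtype=float)
    for _ in range(iters):
        # Sample a population of small perturbations around the current mean.
        cands = mean + step * rng.standard_normal((pop, mean.size))
        scores = np.array([confidence(c) for c in cands])
        if scores.min() < 0.5:
            # Found an in-distribution misclassification near the start point.
            return cands[scores.argmin()]
        # Recombine the lowest-confidence candidates to steer the search.
        mean = cands[np.argsort(scores)[:elites]].mean(axis=0)
    return None
```

A real implementation would replace this fixed-step loop with CMA-ES (e.g. the `cma` package), which also adapts the sampling covariance, but the ask-score-recombine structure is the same.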
Learning Visual Importance for Graphic Designs and Data Visualizations
Knowing where people look and click on visual designs can provide clues about
how the designs are perceived, and where the most important or relevant content
lies. The most important content of a visual design can be used for effective
summarization or to facilitate retrieval from a database. We present automated
models that predict the relative importance of different elements in data
visualizations and graphic designs. Our models are neural networks trained on
human clicks and importance annotations on hundreds of designs. We collected a
new dataset of crowdsourced importance, and analyzed the predictions of our
models with respect to ground truth importance and human eye movements. We
demonstrate how such predictions of importance can be used for automatic design
retargeting and thumbnailing. User studies with hundreds of MTurk participants
validate that, with limited post-processing, our importance-driven applications
are on par with, or outperform, current state-of-the-art methods, including
natural image saliency. We also provide a demonstration of how our importance
predictions can be built into interactive design tools to offer immediate
feedback during the design process.
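One of the applications mentioned above, importance-driven thumbnailing, can be sketched in a few lines. The sketch assumes the importance map has already been predicted by a model such as the one described (here it is simply an input array) and picks the crop window with the largest total predicted importance:

```python
import numpy as np

def best_thumbnail_crop(importance, crop_h, crop_w):
    """Return the (row, col) top-left corner of the crop_h x crop_w window
    covering the most total predicted importance in the map."""
    H, W = importance.shape
    # 2D summed-area table so each window sum costs O(1).
    sat = np.zeros((H + 1, W + 1))
    sat[1:, 1:] = importance.cumsum(axis=0).cumsum(axis=1)
    best, best_pos = -np.inf, (0, 0)
    for y in range(H - crop_h + 1):
        for x in range(W - crop_w + 1):
            s = (sat[y + crop_h, x + crop_w] - sat[y, x + crop_w]
                 - sat[y + crop_h, x] + sat[y, x])
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos
```

The paper's retargeting pipeline involves more post-processing than this; the sketch only shows how a predicted importance map can drive an automatic cropping decision.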
Additional file 3: Table S3. of Exploiting the recognition code for elucidating the mechanism of zinc finger protein-DNA interactions
Checking the top predictions (top 50) to establish the relationship between Approach 2 (consensus amino acids and synergistic binding mode) and Approach 1 (consensus amino acid and modular binding mode) predictions for all 16 GNN targets. Our Approach 2 predictions for Finger 3 coincide with the Approach 1 predictions. (DOCX 23 kb)
Additional file 1: Table S1. of Exploiting the recognition code for elucidating the mechanism of zinc finger protein-DNA interactions
Detailed analysis of predictions against experimental data for Approaches 1 and 3 for all 16 GNN triplets at different finger positions (finger 1, 2, or 3). The experimental data comprise the target DNA sequence that binds its respective helix3 on the ZFP at various positions (finger 1, finger 2, and finger 3), with the corresponding Kd values used to determine the experimental affinity between the DNA and its respective ZFP. Approach 3 (all possible amino acids and modular binding mode) gives its top helix for the experimental DNA target with its rank and score, respectively. Approach 1 (consensus amino acid and modular binding mode) gives its top helix for the experimental DNA target with its rank and IHBE score, respectively. Approach 1 predicts zinc finger helices for experimental DNA targets accurately for Finger 3, followed by Finger 1, whereas Approach 3 does so for Finger 2. (DOCX 56 kb)
Effects of title wording on memory of trends in line graphs
Graphs and data visualizations can give us a visual sense of trends on topics ranging from poverty and the spread of diseases to the popularity of products. What makes graphs useful is our ability to perceive these trends at a glance. Related work has investigated the effect of different properties of graphs, including axis scaling, the choice of encoding, and the presence of pictographic elements (e.g., Haroz et al. 2015), on the perception of trends or the remembered size of the quantities depicted. Previous work has shown that visual attention is directed towards text, and specifically titles, which can affect what is recalled from memory (Borkin, Bylinskii, et al. 2016; Matzen et al. 2017). In a more controlled setting, we investigate how the wording of a line graph's title impacts memory of the trend's slope. We designed a set of experiments that consist of first showing participants a simple graph with an increasing or decreasing trend, paired with a title that is either strongly stated ("Contraceptive use in Senegal skyrockets") or more neutral ("Contraceptive use in Senegal rises"). To avoid rehearsal, participants then performed a challenging distractor task before being asked to recall the title and answer a question about the graph's initial/final value or an extrapolated value. Can we change a participant's memory of a graph by modifying some accompanying text? These experiments bear resemblance to the eyewitness testimony experiments of Loftus et al. (1996). In some conditions, the strength of the wording in the title affects how participants recall the trend from memory, but this effect is not universal across experiments. The results of these experiments have important implications for how text interacts with long-term visual memory and may bias future inferences.
References:
Haroz, S., Kosara, R., & Franconeri, S. L. (2015). Isotype visualization: Working memory, performance, and engagement with pictographs. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. DOI: https://doi.org/10.1145/2702123.2702275
Borkin, M. A., Bylinskii, Z., Kim, N. W., Bainbridge, C. M., Yeh, C. S., Borkin, D., ... & Oliva, A. (2016). Beyond memorability: Visualization recognition and recall. IEEE Transactions on Visualization and Computer Graphics, 22(1), 519-528.
On the Capability of Neural Networks to Generalize to Unseen Category-Pose Combinations
Recognizing an object’s category and pose lies at the heart of visual understanding. Recent works suggest that deep neural networks (DNNs) often fail to generalize to category-pose combinations not seen during training. However, it is unclear when and how such generalization may be possible. Does the number of combinations seen during training impact generalization? Is it better to learn category and pose in separate networks, or in a single shared network? Furthermore, what are the neural mechanisms that drive the network’s generalization? In this paper, we answer these questions by analyzing state-of-the-art DNNs trained to recognize both object category and pose (position, scale, and 3D viewpoint) with quantitative control over the number of category-pose combinations seen during training. We also investigate the emergence of two types of specialized neurons that can explain generalization to unseen combinations: neurons selective to category and invariant to pose, and vice versa. We perform experiments on MNIST extended with position or scale, the iLab dataset with vehicles at different viewpoints, and a challenging new dataset for car model recognition and viewpoint estimation that we introduce in this paper, the Biased-Cars dataset. Our results demonstrate that as the number of combinations seen during training increases, networks generalize better to unseen category-pose combinations, facilitated by an increase in the selectivity and invariance of individual neurons. We find that learning category and pose in separate networks, compared to a shared one, leads to an increase in such selectivity and invariance, as separate networks are not forced to preserve information about both category and pose. This enables separate networks to significantly outperform shared ones at predicting unseen category-pose combinations. This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
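The two specialized neuron types discussed above can be quantified in a simple way. The measure below is an illustrative variance-decomposition sketch, not the paper's exact selectivity and invariance indices: given a neuron's mean activation for each category at each pose, it reports how much of the response variance is explained by category versus by pose. A category-selective, pose-invariant neuron scores high on the first share and near zero on the second:

```python
import numpy as np

def selectivity_invariance(acts):
    """acts[c, p]: mean activation of one neuron for category c at pose p.
    Returns (category_share, pose_share): the fraction of the neuron's
    response variance explained by category means vs. by pose means."""
    total = acts.var()
    if total == 0:
        return 0.0, 0.0  # flat neuron: neither selective nor informative
    cat_var = acts.mean(axis=1).var()   # variance across category means
    pose_var = acts.mean(axis=0).var()  # variance across pose means
    return cat_var / total, pose_var / total
```

For a neuron whose response depends only on category, the pose share is zero and the category share is one; the pose-selective, category-invariant type is the mirror image.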